A Pareto model for OLAP view size estimation

Authors

  • Thomas P. Nadeau
  • Toby J. Teorey
Abstract

On-Line Analytical Processing (OLAP) aims to extract useful information quickly from large amounts of data residing in a data warehouse. Pre-aggregation is a useful strategy for improving query response time. However, it is usually impossible to pre-aggregate along all combinations of the dimensions: the multidimensional nature of the data leads to a combinatorial explosion in the number and potential storage size of the aggregates, so we must pre-aggregate selectively. The cost/benefit analysis involves estimating the storage requirements of the aggregates in question. We present an original algorithm, based on the Pareto distribution model, for estimating the number of rows in an aggregate. We test the Pareto Model Algorithm empirically against three published algorithms and conclude that it is consistently the best of these algorithms for estimating view size.
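
To make the combinatorial explosion and the view-size estimation problem concrete, here is a minimal sketch. It is not the Pareto Model Algorithm, whose details the abstract does not give. It uses a simple uniform-distribution baseline, commonly attributed to Cardenas, that assumes fact rows fall uniformly at random over the cells of a view. The function names, the dimension names, and the cardinalities are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def cardenas_estimate(n_rows, n_cells):
    """Expected number of distinct cells occupied when n_rows fact rows
    are spread uniformly at random over n_cells possible cells.
    Assumption: uniform data, which real warehouse data often violates."""
    return n_cells * (1.0 - (1.0 - 1.0 / n_cells) ** n_rows)


def enumerate_views(cardinalities):
    """Yield every subset of dimensions; with d dimensions and no
    hierarchies there are 2**d candidate views."""
    dims = sorted(cardinalities)
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            yield group


def estimate_all_views(cardinalities, n_rows):
    """Map each candidate view (a tuple of dimension names) to its
    estimated row count under the uniform baseline."""
    estimates = {}
    for group in enumerate_views(cardinalities):
        # The empty view (grand total) has exactly one possible cell.
        n_cells = prod(cardinalities[d] for d in group)
        estimates[group] = cardenas_estimate(n_rows, n_cells)
    return estimates


if __name__ == "__main__":
    # Hypothetical dimension cardinalities and fact-table size.
    cards = {"product": 100, "store": 10, "day": 365}
    for view, rows in estimate_all_views(cards, n_rows=10_000).items():
        print(view, round(rows))
```

Even in this toy setting the number of candidate views doubles with each added dimension, which is why the abstract argues for selective pre-aggregation guided by size estimates.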

Similar resources

Estimating the size probability of landslides that occurred in the Pivejan Watershed (Razavi Khorasan Province)

Knowing the number, area, and frequency of the landslides that have occurred in an area plays a prominent role in understanding the long-term evolution of landslide-dominated areas and can be used for analyzing susceptibility, hazard, and risk. In this regard, the current research examines the size probability of identified landslides in the Pivejan Watershed, Razavi Khorasan Province. In the first step, lands...

Size Estimation of OLAP Systems

Software size estimation at the early stages of project development is highly significant for meeting the competitive demands of the software industry. Software size is one of the most interesting internal attributes and has been used in several effort/cost models as a predictor of the effort and cost needed to design and implement the software. The whole world is focusing towards object oriented ...

Estimation and Reconstruction Based on Left Censored Data from Pareto Model

In this paper, based on left-censored data from the two-parameter Pareto distribution, maximum likelihood and Bayes estimators for the two unknown parameters are obtained. The problem of reconstructing past failure times, either as points or as intervals, in the left-censored set-up is also considered from Bayesian and non-Bayesian approaches. Two numerical examples and a Monte Carlo simulatio...

Bayesian Estimation for the Pareto Income Distribution under Asymmetric LINEX Loss Function

The use of the Pareto distribution as a model for various socio-economic phenomena dates back to the late nineteenth century. In this paper, after some necessary preliminary results, we deal with Bayes estimation of some of the parameters of interest under an asymmetric LINEX loss function, using a suitable choice of priors when the scale parameter is known and unknown. Results of a Monte C...

A Non-Linear Cost Model for Multi-Node OLAP Systems

The performance of answering business queries, mainly aggregate in nature and known as On-Line Analytical Processing queries, depends heavily on the proper selection of multidimensional structures, known as materialized subcubes or views. As users' query profiles change, these structures have to be recalibrated once a new appropriate selection has been chosen through a cube view selection algorithm. In...

Publication date: 2001